Escape sequences can be used to designate a single character literal or characters in a string literal. Escape sequences followed by a character
code are especially handy when the character is non-visible or is not directly representable in the source code encoding. However, before C++23 the
way to write them is convoluted. Let’s take, for instance, the character @
:
- In octal, it is written
\100
. This notation has two main issues: It’s not easy to spot octal literals (it’s just some digits,
nothing makes it obvious that they are octal digits), and it’s not easy to say when the escape sequence ends (see S2335). This
is so confusing that it is commonly advised to avoid octal escape sequences (see S1314).
- Hexadecimal sequences are written
\x40
. While they stand out more prominently, it’s still difficult to say when they end (see also
S2335).
- Universal character names, while not strictly speaking escape sequences, have a very similar visual appearance and are written
\u0040
(exactly 4 hexadecimal digits) or \U00000040
(exactly 8 hexadecimal digits). The fact that the number of digits
depends on the case of the introducing u
can be a source of errors.
C++23 introduces a new way to define those escape sequences, which is more consistent, non-ambiguous, and makes the ending of the literal obvious.
It is now possible to write the previous examples respectively \o{100}
, \x{40}
, \u{40}
and \u{40}
again (leading zeros are optional).
This rule raises an issue when a legacy escape sequence can be replaced with an improved version.